The Gaussian distribution, also known as the normal distribution, is a continuous probability distribution that is symmetric about its mean. It describes how values of a variable are distributed in many natural phenomena such as heights, test scores, and measurement errors.
The probability density function (PDF) of a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) is given by:
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
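As a quick sanity check of the density formula above, here is a minimal Python sketch. The function name `normal_pdf` and its default arguments are illustrative choices, not part of any library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal density f(x) for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)

# The density peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.0), normal_pdf(1.0)
print(peak, left, right)
```

For the standard normal (mean 0, standard deviation 1) the peak is about 0.399, and the values at -1 and +1 should match.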
There is no difference between a Gaussian distribution and a normal distribution; both terms refer to the same distribution. "Gaussian" honors the mathematician Carl Friedrich Gauss, who studied the distribution in depth. "Normal" is the name more commonly used in statistics, reflecting how often the distribution appears in natural data.
In summary, they are two names for the same bell-shaped distribution.
Several standard rules help you choose the number of intervals (bins) and their width when grouping continuous, approximately normal data.
Sturges' rule is useful for approximately normal distributions and medium-sized datasets. Round the result up to a whole number of intervals.
Number of intervals:
\[ k = 1 + \log_2(n) \]
Width of each interval:
\[ h = \frac{\max(x) - \min(x)}{k} \]
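The two formulas above can be sketched in a few lines of Python. This is a minimal sketch; the helper name `sturges_bins` and the sample data are made up for illustration.

```python
import math

def sturges_bins(data):
    """Return (k, h) under Sturges' rule; k is rounded up to a whole number of bins."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    k = math.ceil(1 + math.log2(n))
    h = (max(data) - min(data)) / k
    return k, h

# Hypothetical sample: the 100 integers from 0 to 99.
sample = list(range(100))
k, h = sturges_bins(sample)
print(k, h)
```

With n = 100, 1 + log2(100) is about 7.64, so the count rounds up to 8 intervals.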
Square-root rule, number of intervals:
\[ k = \sqrt{n} \]
Width:
\[ h = \frac{\max(x) - \min(x)}{k} \]
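To see how the square-root count compares with the \(1 + \log_2(n)\) count as the sample grows, here is a small sketch. The helper `bin_counts` and the chosen sample sizes are illustrative assumptions.

```python
import math

def bin_counts(n):
    """Return (Sturges k, square-root k) for sample size n, both rounded up."""
    return math.ceil(1 + math.log2(n)), math.ceil(math.sqrt(n))

for n in (30, 100, 1000, 10000):
    k_sturges, k_sqrt = bin_counts(n)
    print(f"n={n:>5}  Sturges k={k_sturges:>2}  square-root k={k_sqrt:>3}")
```

The square-root count grows much faster with n, so the two rules agree on small samples and diverge on large ones.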
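Scott’s rule minimizes the asymptotic mean integrated squared error of the histogram when the data are normally distributed.
Interval width, where \(\sigma\) is the sample standard deviation:
\[ h = \frac{3.5 \,\sigma}{n^{1/3}} \]
Number of intervals (round up to a whole number):
\[ k = \frac{\max(x)-\min(x)}{h} \]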
The Freedman–Diaconis rule uses the interquartile range (IQR) instead of the standard deviation, making it robust to outliers.
Interval width:
\[ h = \frac{2\,\mathrm{IQR}}{n^{1/3}} \]
Number of intervals (round up to a whole number):
\[ k = \frac{\max(x) - \min(x)}{h} \]
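Here is a minimal NumPy sketch of the IQR-based width and count above. The helper name `freedman_diaconis_bins` and the sample with a single extreme value are hypothetical, and the IQR uses `np.percentile` with its default interpolation.

```python
import numpy as np

def freedman_diaconis_bins(data):
    """Return (k, h) under the Freedman-Diaconis rule."""
    x = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    if iqr == 0:
        raise ValueError("IQR is zero; the rule gives no usable width")
    h = 2.0 * iqr / x.size ** (1 / 3)
    # Round up so the bins cover the full data range.
    k = int(np.ceil((x.max() - x.min()) / h))
    return k, h

# Hypothetical data: 1..99 plus one extreme outlier.
sample = np.append(np.arange(1, 100), 1000.0)
k, h = freedman_diaconis_bins(sample)
print(k, h)
```

The outlier barely moves the IQR, so the width stays stable. It still stretches the range, though, so the bin count grows; trimming or clipping extreme values before counting bins is one option if that matters.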
| Your Situation | Best Method |
|---|---|
| Data is normal + medium/large sample | Scott |
| Data is normal + small/medium sample | Sturges |
| Simple/quick binning | Square root |
| Outliers or heavy tails | Freedman–Diaconis |
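For example, Scott's rule in Python:

```python
import numpy as np

# Placeholder sample so the snippet runs; replace with your own data.
values = [4.8, 5.1, 5.3, 4.9, 5.0, 5.6, 4.7, 5.2, 5.4, 5.0]
x = np.asarray(values, dtype=float)
n = len(x)

# Scott's rule: bin width from the sample standard deviation.
sigma = np.std(x, ddof=1)
h = 3.5 * sigma / n ** (1 / 3)
# Round up so the bins cover the full data range.
k = int(np.ceil((x.max() - x.min()) / h))

print("Number of bins =", k)
print("Width =", h)
```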
Number of intervals:
\[ k = \begin{cases} 1 + \log_2(n) & \text{Sturges} \\ \sqrt{n} & \text{Square root} \\ \frac{\max(x)-\min(x)}{3.5\,\sigma\,n^{-1/3}} & \text{Scott} \\ \frac{\max(x)-\min(x)}{2\,\mathrm{IQR}\,n^{-1/3}} & \text{Freedman–Diaconis} \end{cases} \]
Interval width:
\[ h = \frac{\max(x)-\min(x)}{k} \]
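If you already use NumPy, its `np.histogram_bin_edges` function accepts the estimator names `"sturges"`, `"sqrt"`, `"scott"`, and `"fd"`, so you can cross-check the formulas above. The sketch below runs on synthetic data generated purely for illustration. NumPy applies its own constants and rounding (its Scott constant, for instance, is close to 3.5 but not exactly 3.5), so a count can differ by a bin from a hand calculation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.normal(loc=50.0, scale=10.0, size=500)  # synthetic data, for illustration only

summary = {}
for rule in ("sturges", "sqrt", "scott", "fd"):
    edges = np.histogram_bin_edges(x, bins=rule)
    # k is the number of bins; h is the equal width after k is rounded up.
    summary[rule] = (len(edges) - 1, edges[1] - edges[0])

for rule, (k, h) in summary.items():
    print(f"{rule:>8}: k={k:>3}  h={h:.3f}")
```

The edges returned for any rule can be passed directly as the `bins` argument of `np.histogram`.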
If you want, I can calculate the bins for your dataset or generate a helper function.